I. INTRODUCTION



A graph or network consists of n vertices (nodes) $V_i$ with edges (communication lines) connecting them. It can be described by an $n \times n$ connectivity matrix $C$ where $C_{ij} = C_{ji}$ = number of edges connecting $V_i$ and $V_j$.

Even when we allow $C_{ij}$ to be only 0 or 1, for disconnected or connected $V_i, V_j$, the number of possible $C$ matrices, $2^{n(n-1)/2}$, is huge already for moderate n.

If two matrices differ only by the labelling of the vertices, i.e. by a similarity transformation $C' = U^{-1} C U$ with $U$ ($U^{-1} = U^{T}$) effecting the permutation of rows (columns) of $C$, then $C$ and $C'$ represent the same graph.

Since there are n! such permutations, the problem of deciding whether two connectivity matrices correspond to the same graph is believed to be of a high degree of difficulty. It is equally hard to find intrinsic, relabelling-invariant features of graphs which characterize all graphs. Even if they do not achieve this goal, such intrinsic features may be most valuable. Thus the characteristic polynomial or the eigenvalues $(\lambda_1 \dots \lambda_n)$ of the connectivity matrix encode many important graph-theoretic features [1].
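A small plain-Python illustration of the two points above (the function names `relabel`, `power_traces` and `same_graph_bruteforce` are hypothetical, not from the paper): the brute-force test has to try all n! relabellings, whereas the traces $\mathrm{tr}(C^s) = \sum_k \lambda_k^s$ for $s = 1 \dots n$, which determine the eigenvalues, are unchanged by any relabelling. Equal traces are necessary but not sufficient for two matrices to describe the same graph, since non-isomorphic cospectral graphs exist.

```python
from itertools import permutations


def relabel(C, perm):
    """Return U^T C U for the permutation matrix U of `perm`:
    vertex perm[a] of C becomes vertex a of the result."""
    n = len(C)
    return [[C[perm[a]][perm[b]] for b in range(n)] for a in range(n)]


def matmul(A, B):
    """Product of two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def power_traces(C, s_max):
    """[tr(C^1), ..., tr(C^s_max)], where tr(C^s) = sum_k lambda_k^s."""
    traces, P = [], C
    for _ in range(s_max):
        traces.append(sum(P[i][i] for i in range(len(C))))
        P = matmul(P, C)
    return traces


def same_graph_bruteforce(A, B):
    """True if some relabelling of A equals B (tries all n! permutations)."""
    return any(relabel(A, p) == B for p in permutations(range(len(A))))


path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]  # V0-V1-V2-V3
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]  # V0 joined to all
shuffled = relabel(path, (2, 0, 3, 1))

print(same_graph_bruteforce(path, shuffled))                # True
print(same_graph_bruteforce(path, star))                    # False
print(power_traces(path, 4) == power_traces(shuffled, 4))   # True
```

The factorial cost of `same_graph_bruteforce` is exactly the difficulty described in the text; `power_traces` is cheap but only yields invariants, not a decision procedure.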

For most applications a complete characterization of graphs/networks is redundant. We 
are often interested in the "Big picture" or gross features. These include the answers to the 
following general questions about the graph/network: 

Q1: "Are there some groups of vertices which are relatively strongly interconnected and more weakly connected to the rest of the ("external") vertices?"

We will refer to these groups as "clusters in graphs". Clearly these differ from the graph-theoretic "cliques", defined by requiring that each vertex in the clique be connected to all other vertices in the clique, with no reference to the extent of external connections.

Q2: "Are there groups of vertices which are "distant" from each other in the sense that 
there are no (or few) "short paths" connecting them?" ("Short paths" are those with a 
small number of consecutive links.) 

Ideally we would like to view a complex graph as a smaller set of $k$ ($k \ll n$) "supervertices", each having a specific internal structure. By connecting to other supervertices, these form a "supergraph" at a higher level.

The sheer number of graphs seems to defy such a goal when all graphs are considered. We believe, however, that actual communication, social, commercial, political, etc. networks are essentially not random.

The very history of their, often gradual, formation can result in a hierarchical clustering. There is often a further tendency to enhance clustering: if $V_i$ and $V_j$ are both strongly connected to a third vertex $V_k$, then $V_i$ and $V_j$ also frequently develop a direct connection.

Physical constraints such as the three dimensional space we live in and the essentially 
two dimensional surface of the earth or boards of printed circuits also play a crucial role 
along with the need to economize on the total length and usage of communication lines. 

All the above tends to make "clusters in graphs" with relatively loose connections between them more likely, so that the two questions Q1, Q2 above can be answered in the affirmative.

The following analogous situation in biology is quite instructive. An outstanding problem in post-genomic biology is to predict the folding of proteins given their known amino acid sequence. While natural "native" proteins almost instantaneously fold into their functional three-dimensional form, artificially constructed random sequences do not. It is believed that "building blocks" and specific "energy landscapes" help guide the system to its correct folded form, in nature and in simulations. This is reminiscent of the present problem, where methods geared to specific "real-life" networks with a presumed tendency for clustering are advantageous.

How can we efficiently search for such patterns? 

We can ask for the number of paths of length s in the graph connecting a vertex $V_i$ to itself or to another vertex $V_j$. By "feeling out" larger and larger regions (as s increases) we can tell if $V_i$ belongs inside a cluster and if $V_i$ and $V_j$ are distant in the sense described above. We will elaborate on a simple approach for achieving this in Section II below.

Bringing vertices in a "cluster" into close spatial proximity can help in identifying these clusters. This can be achieved in a dynamical approach in which we model the vertices $V_i$ by movable point masses at $\vec r_i$. Attractive "forces" are postulated between any pair of points which are connected in the original graph. Possible implementations of this general approach are discussed in Section III.
II. THE NUMBER OF RETURNING PATHS AS A TEST FOR "CLUSTERING IN GRAPHS"



Imagine an actual physical model of the network where each edge is replaced by a 1 Ω resistor. The electrical resistance between two nodes (or between two groups of vertices which are separately shorted) nicely models the "distance" between these nodes (or the two groups) as defined in Q2 above. The laws of adding resistances in series and in parallel imply that the resistance, like the "distance", increases the longer the paths on the graph connecting the two nodes are, and decreases with the number of such connecting paths. Instead of using this analog computation we can, by using powers of the connectivity matrix $C$, trace out the evolution in s steps of messages sent from each node to all its neighbors. In fact the $(i, j)$ element of $C^s$, $(C^s)_{ij}$, equals the number of paths comprised of s consecutive edges which start at $V_i$ and terminate at $V_j$. In particular $(C^s)_{ii}$ is the number of paths returning to $V_i$ in s steps.
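The counting statement can be checked directly. The plain-Python sketch below (hypothetical helper names) forms $C^s$ by repeated multiplication; note that the counted "paths" are walks, i.e. they may revisit vertices and edges.

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_counts(C, s):
    """Return C^s: entry (i, j) counts paths of s consecutive edges V_i -> V_j."""
    n = len(C)
    P = [[int(i == j) for j in range(n)] for i in range(n)]  # C^0 = identity
    for _ in range(s):
        P = matmul(P, C)
    return P


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
C3 = walk_counts(triangle, 3)
print(C3[0][0])  # 2: V0->V1->V2->V0 and V0->V2->V1->V0
print(C3[0][1])  # 3: V0->V1->V0->V1, V0->V1->V2->V1, V0->V2->V0->V1
```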

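When raised to a high power, $C$, like any real symmetric matrix, simplifies considerably. Let $\lambda_1 \ge \dots \ge \lambda_n$ be the n real eigenvalues of $C$ in descending order and $\vec v_1, \dots, \vec v_n$ the corresponding n orthonormal eigenvectors. Since $C^s = \sum_k \lambda_k^s\, \vec v_k \vec v_k^{\,T}$, for large s (assuming $\lambda_1$ exceeds all other eigenvalues in absolute value) the columns of $C^s$ all become proportional to $\vec v_1$, with a factor representing the projection of $\vec v_1$ on the i-th column of $C$:

$$(C^s)_{\cdot i} \propto (\vec v_1 \cdot C_{\cdot i})\, \vec v_1 \qquad (1)$$

and likewise for the rows. Upon further multiplication by $C$, $C^s$ then simply gets multiplied by $\lambda_1$.

For the special case when all vertices in $C$ have the same valency v (i.e. each is connected to v others), $\lambda_1 = v$ and

$$\vec v_1 = \frac{1}{\sqrt{n}}\,(1, 1, \dots, 1). \qquad (2)$$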
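Eqs. (1) and (2) are what power iteration exploits. A minimal sketch, assuming $\lambda_1$ is strictly larger in magnitude than all other eigenvalues (this fails, for example, for bipartite graphs, where $-\lambda_1$ is also an eigenvalue) and that the start vector has a nonzero component along $\vec v_1$: repeatedly applying $C$ and renormalizing drives any start vector towards $\vec v_1$. For the complete graph $K_4$ every vertex has valency 3, so eq. (2) predicts $\lambda_1 = 3$ and $\vec v_1 = (1, 1, 1, 1)/2$.

```python
import math


def power_iteration(C, x0, steps=200):
    """Approximate (lambda_1, v_1) by repeatedly applying C and renormalizing.
    Assumes |lambda_1| > |lambda_k| for k > 1 and x0 not orthogonal to v_1."""
    n = len(C)
    x = list(x0)
    for _ in range(steps):
        y = [sum(C[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in y))
        x = [c / norm for c in y]
    Cx = [sum(C[i][j] * x[j] for j in range(n)) for i in range(n)]
    lam = sum(a * b for a, b in zip(x, Cx))  # Rayleigh quotient (x has unit length)
    return lam, x


K4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]  # valency v = 3
lam, v1 = power_iteration(K4, [1.0, 2.0, 3.0, 4.0])
print(round(lam, 6), [round(c, 6) for c in v1])  # 3.0 [0.5, 0.5, 0.5, 0.5]
```

Whatever the starting vector, all information except the projection on $\vec v_1$ is washed out, which is the "trivialization" discussed next.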

While we seek some dilution of information, such trivialization should be avoided. Useful information can be obtained by looking at $(C^s)_{ii}$ at moderate values of s. If $V_i$ belongs in a rich, heavily connected "cluster in graph", then the initial rise of $(C^s)_{ii}$,

$$(C^s)_{ii} \sim (v_{cl})^s \quad \text{for } i \in \text{cluster}, \qquad (3)$$

is higher than the initial rise of the same quantity when $V_i$ is a generic vertex located in a region of average valency v, so that

$$(C^s)_{ii} \sim v^s \qquad (4)$$

with $v < v_{cl}$.
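The difference between the growth laws (3) and (4) is visible already on a toy graph. In the hypothetical example below, a dense 5-clique (internal valency 4) is joined by a single edge to a sparse 5-ring (valency 2), and $(C^s)_{ii}$ is computed for a clique vertex and for a ring vertex far from the bridge.

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def return_counts(C, i, s_max):
    """[(C^1)_ii, ..., (C^s_max)_ii]: paths of length s returning to V_i."""
    counts, P = [], C
    for _ in range(s_max):
        counts.append(P[i][i])
        P = matmul(P, C)
    return counts


def graph(n, edges):
    """0/1 connectivity matrix of an undirected graph given as an edge list."""
    C = [[0] * n for _ in range(n)]
    for a, b in edges:
        C[a][b] = C[b][a] = 1
    return C


# Hypothetical toy graph: 5-clique V0..V4, bridge V4-V5, 5-ring V5..V9.
edges = [(a, b) for a in range(5) for b in range(a + 1, 5)]
edges += [(5, 6), (6, 7), (7, 8), (8, 9), (9, 5), (4, 5)]
C = graph(10, edges)

clique = return_counts(C, 0, 6)  # vertex inside the dense cluster
ring = return_counts(C, 7, 6)    # generic sparse vertex
print(clique[:3])  # [0, 4, 12]
print(ring[:3])    # [0, 2, 0]
```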
To partially avoid the degeneration at high s, and gain more information from $C^s$ for large s, we tried adopting the following strategy. Instead of $C^2$ we use

$$\tilde C_2 = C^2 - \mathrm{diag}\,(C^2). \qquad (5)$$

Since the diagonal of $C^2$ counts all paths which come back to their origins in two steps, these paths are omitted in $\tilde C_2$. Going one more step we consider $\tilde C_2 \cdot C$. By subtracting again its diagonal elements and defining

$$\tilde C_3 = \tilde C_2 \cdot C - \mathrm{diag}\,(\tilde C_2 \cdot C) \qquad (6)$$

we omit all paths which return to their origin in three steps, etc. In general we define

$$\tilde C_{s+1} = \tilde C_s \cdot C - \mathrm{diag}\,(\tilde C_s \cdot C). \qquad (7)$$

Then $(\tilde C_{s+1})_{ij}$ is the number of paths from i to j of length s+1 which have not returned to their starting vertex i at any prior stage, and $(\tilde C_s \cdot C)_{ii}$ is the number of such paths from i back to i, i.e. paths returning to i for the first time after s+1 steps. (Fully self-avoiding walks, which never revisit any vertex, are quite complex even for regular lattices (see, for example, recent papers) and are tied to Ising models in the corresponding dimensions. Exact enumerations of such SAWs yield solutions of the Ising models but not vice versa. Parenthetically we note that if the number of self-avoiding closed walks of length n through some vertex is nonzero, then the graph in question does have a Hamiltonian circuit, namely a closed path of length n which visits each vertex exactly once.)
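The recursion starting from (5) and (6), with general step $\tilde C_{s+1} = \tilde C_s C - \mathrm{diag}(\tilde C_s C)$, is easy to implement. The plain-Python sketch below (hypothetical helper names; exact integer counts, without the column renormalization discussed in the next paragraph) returns, for each s, the diagonal of $\tilde C_{s-1} C$, i.e. the number of length-s paths that return to $V_i$ for the first time at step s. The path-graph check shows that only returns to the starting vertex are removed: $V_0 \to V_1 \to V_2 \to V_1 \to V_0$ is counted although it revisits $V_1$.

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def first_return_counts(C, s_max):
    """Map s -> [(C~_{s-1} C)_ii for every i], for s = 3..s_max, where
    C~_2 = C^2 - diag(C^2) and C~_{s+1} = C~_s C - diag(C~_s C).
    Entry i counts length-s paths from V_i that return to V_i for the first
    time at step s. Assumes C has a zero diagonal (no self-loops)."""
    n = len(C)

    def drop_diag(M):
        return [[0 if i == j else M[i][j] for j in range(n)] for i in range(n)]

    T = drop_diag(matmul(C, C))  # C~_2
    out = {}
    for s in range(3, s_max + 1):
        P = matmul(T, C)         # C~_{s-1} C
        out[s] = [P[i][i] for i in range(n)]
        T = drop_diag(P)         # C~_s
    return out


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]      # V0-V1-V2
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(first_return_counts(path, 4))      # {3: [0, 0, 0], 4: [1, 0, 1]}
print(first_return_counts(triangle, 4))  # {3: [2, 2, 2], 4: [2, 2, 2]}
```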

While the latter number increases much more slowly than $(C^s)_{ii}$, it still "runs away" as $s \to \infty$, so that we need to "renormalize" at each stage to have each column vector of $\tilde C_s$ be of unit length. A plot of $(\tilde C_s \cdot C)_{ii}$ as a function of s could ideally help "map out" other clusters in the graph. After "exhausting" all vertices in the putative initial cluster $C_1$ in which $V_i$ resided, which for self-avoiding paths would take $n_1$ steps with $n_1$ the number of vertices in $C_1$, we will wander off into a generic part of the graph. There the slower growth rate (4) will take over. If we can reach a second rich cluster $C_2$ in $d_{12}$ steps, we could after that number of steps start having again a fast growth rate (3). This continues until, $n_2$ steps later, $C_2$ is exhausted, etc.

However, the graph "between the clusters" is still a network. This causes diffusive migration between two clusters with no sharp arrival times. Also, for appreciable s, several clusters may be reached at the same or a similar number of steps. These features tend to smooth out the changes of $(\tilde C_s \cdot C)_{ii}$.
III. DYNAMICAL EVOLUTION HIGHLIGHTING NETWORK STRUCTURE.



A basic difficulty in discerning intrinsic graph/network structure is that the connectivity matrix depends on the labelling of the vertices. The following example clearly illustrates this. Let us assume a large subset of vertices in our graph indeed divides naturally into fairly well-defined clusters $C_1$ with $n_1$ vertices, $C_2$ with $n_2$ vertices, etc., up to $C_k$. If we label our vertices in such a way that all vertices belonging to any one cluster are contiguous, the connectivity matrix will be "Almost Block Diagonal".

This is depicted in Fig. 1: the $n_1 \times n_1$, $n_2 \times n_2$, ... submatrices along the diagonal will then be connectivity matrices for the first, second, etc. cluster. By assumption these matrices have a relatively high proportion of non-vanishing elements. The corresponding darker squares can thus be visually discerned against the background of the lighter, sparser remaining parts of the original $C$ matrix.

This nice feature completely disappears after massive relabelling, i.e. massive joint reshufflings of columns and rows in the matrix $C$ (Fig. 2). The whole matrix will then have a roughly constant average density of unit entries, looking uniformly gray. Our goal is essentially to reconstruct the original, convenient "Almost Block Diagonal" form which exhibits the clusters. The difficulty is exacerbated by the fact that the block diagonalization is only approximate and there are many non-vanishing entries outside the blocks. Also we do not know a priori what size the blocks are or how many blocks exist. The representation of a graph by drawing it in two dimensions also introduces undesired arbitrariness, reflected in the choice of coordinates $(x_i, y_i)$ of the points representing the various vertices. Two different drawings of the same graph may appear completely different and unrelated.

Such arbitrariness is particularly harmful when we try to implement the general idea 
described above and introduce attractive forces between any pair of points representing 
a pair of connected vertices. The subsequent motion of the points may depend on their 
arbitrary initial placement. 

FIG. 1: Connectivity matrix with average intra-cluster valency 20% and inter-cluster connectivity valency 3%.

To place the n vertices in a completely symmetric and unbiased manner we need to go to $n-1$ dimensions. The vertices (or the n physical point masses modelling them in our approach) can then be put at the n vertices of a symmetric simplex inscribed inside the unit sphere in $n-1$ dimensions. Specific coordinates of the n vertices can be constructed in a simple inductive process indicated in Appendix A. All vertices are equidistant from the origin; specifically we choose:

$$\vec r_i^{\,2} = 1. \qquad (8)$$
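Using this, $\left(\sum_i \vec r_i\right)^2 = 0$ (the simplex is centred at the origin) and the equality, due to symmetry, of all $\vec r_i \cdot \vec r_j$ for $i \ne j$, so that $0 = \left(\sum_i \vec r_i\right)^2 = n + n(n-1)\,\vec r_i \cdot \vec r_j$, readily implies:

$$\vec r_i \cdot \vec r_j = -\frac{1}{n-1}, \quad \text{all } i \ne j,\ i, j = 1 \dots n. \qquad (9)$$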



The distance between any pair of vertices of the simplex, i.e. between any pair of the representative points at the outset of our proposed dynamical simulation, is therefore:

$$|\vec r_i - \vec r_j| = \sqrt{\frac{2n}{n-1}}, \quad \text{all } i \ne j. \qquad (10)$$
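One convenient explicit construction, which we use here instead of the inductive one of Appendix A (an assumption of this sketch), embeds the simplex in $\mathbb{R}^n$: $\vec r_i = \sqrt{n/(n-1)}\,(\vec e_i - \frac{1}{n}(1, \dots, 1))$. These n points lie in the $(n-1)$-dimensional hyperplane orthogonal to $(1, \dots, 1)$, and the code checks eqs. (8) to (10) numerically.

```python
import math


def simplex_vertices(n):
    """n unit vectors r_i = sqrt(n/(n-1)) * (e_i - (1,...,1)/n) in R^n.
    They span an (n-1)-dimensional hyperplane and form a regular simplex
    centred at the origin (assumed construction, not the one of Appendix A)."""
    scale = math.sqrt(n / (n - 1))
    return [[scale * ((1.0 if k == i else 0.0) - 1.0 / n) for k in range(n)]
            for i in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


n = 6
r = simplex_vertices(n)
print(dot(r[0], r[0]))                                     # close to 1 (eq. 8)
print(dot(r[0], r[1]), -1 / (n - 1))                       # both close to -0.2 (eq. 9)
print(math.dist(r[0], r[1]), math.sqrt(2 * n / (n - 1)))   # equal (eq. 10)
```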
FIG. 2: Randomly reshuffled connectivity matrix C.

We next endow our system with some dynamics. We introduce a fictitious attractive force between points corresponding to vertices which are connected in the initial graph of interest. Thus if $C_{ij} \ne 0$ we postulate

$$\vec F_{ij}(\vec r_i, \vec r_j) = C_{ij}\, f(|\vec r_i - \vec r_j|)\, \frac{\vec r_j - \vec r_i}{|\vec r_i - \vec r_j|} \qquad (11)$$

to be the force attracting the point mass i to the point mass j, in the direction of $\vec r_j - \vec r_i$. To retain the initial symmetry and avoid any biasing we take the same force law $f(r)$ for all pairs. The specific shape of $f(r)$ will be tuned to optimize the gradual clustering. In general $f(r)$ falls with distance and, conversely, grows at short distances.

The only way information about the specific graph of interest is communicated to our dynamical n-body system is via the overall strengths $C_{ij}$ of the forces; the force vanishes if $C_{ij} = 0$. For the generalizations considered later, and also to better mimic real networks, we allow any $C_{ij} = C_{ji} \ge 1$, so that it counts the number and "quality" of connections between $V_i$ and $V_j$.
We next let our points move according to standard Newtonian dynamics:

$$m_i \frac{d^2 \vec r_i}{dt^2} = \vec F_i = \sum_j \vec F_{ij}. \qquad (12)$$



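In order to avoid "overshoots" and oscillations we can add damping via viscous frictional forces:

$$m_i \frac{d^2 \vec r_i}{dt^2} = \vec F_i - \mu_i \frac{d \vec r_i}{dt}. \qquad (13)$$

Finally we can adopt the extreme $\mu_i \gg m_i$ so as to neglect inertial effects and have first-order "Aristotelian dynamics":

$$\mu_i \frac{d \vec r_i}{dt} = \vec F_i. \qquad (14)$$

The latter is readily discretized for time increments $\delta$:

$$\vec r_i(t+\delta) = \vec r_i(t) + \frac{\delta}{\mu_i}\, \vec F_i(\vec r_1(t), \dots, \vec r_n(t)), \quad i = 1 \dots n. \qquad (15)$$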

To preserve the initial symmetry we take all initial masses (and, separately, all initial viscosities) to be equal, $\mu_i = \mu$, $m_i = m$. Different masses (and/or viscosities) will arise at later stages when we treat supergraphs with heavy vertices representing initial clusters.
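The attractive central forces can be derived from a pairwise potential, i.e.:

$$f(r) = \frac{d}{dr} U(r), \qquad (16)$$

so that $\vec F_i = -\nabla_{\vec r_i} U$, with the overall potential energy

$$U(\vec r_1 \dots \vec r_n) = \sum_{i>j} C_{ij}\, U(|\vec r_i - \vec r_j|). \qquad (17)$$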

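$U(r)$ is assumed to increase monotonically with r, i.e. to decrease monotonically with decreasing r, as befits attractive forces. The possible equilibrium "fixed points" of our dynamical system, namely those for which

$$\vec F_i = 0 \quad \text{for all } i, \qquad (18)$$

are then stationary points of $U(\vec r_1 \dots \vec r_n)$.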

With only attractive forces or potentials present, our n-point system eventually collapses towards the origin. This is readily seen, as the scaling

$$\vec r_i \to \lambda\, \vec r_i \qquad (19)$$

with $\lambda < 1$ will obviously decrease the $U(\vec r_1 \dots \vec r_n)$ of equation (17) for any set of $\vec r_i$.

A joint collapse of all n points happening before the vertices belonging to "clusters in the 
graph" have separately concentrated in different regions compromises our goal of identifying 
the latter clusters. 

To avoid the radial collapse we constrain $\vec r_i(t)$ at all times to be on the unit sphere:

$$|\vec r_i(t)| = \text{constant} = 1, \quad \text{all } t \ge 0. \qquad (20)$$

To incorporate this we supplement eq. (15) by a length renormalization,

$$\vec r_i(t+\delta) \to \frac{\vec r_i(t+\delta)}{|\vec r_i(t+\delta)|}, \qquad (21)$$

to be performed following the operation (15) at each step of our evolution. The constraint (20) amounts to introducing normal (radial) reaction forces which cancel the radial components of any of the forces $\vec F_i$, leaving us with only the tangential parts:

$$\vec F_i^{\,t} = \vec F_i - (\vec F_i \cdot \vec r_i)\, \vec r_i. \qquad (22)$$
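Putting eqs. (11), (12), (15) and (22) together, one step of the constrained first-order dynamics can be sketched as below. This is a plain-Python illustration, not the authors' code: the force law `f` is left to the caller (the example uses the hypothetical choice $f(r) = 1/r$), all viscosities are taken equal, and each point is rescaled back to unit length after the step.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def forces(r, C, f):
    """F_i = sum_j C_ij f(|r_i - r_j|) (r_j - r_i) / |r_i - r_j|   (eqs. 11, 12)."""
    n, d = len(r), len(r[0])
    F = [[0.0] * d for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or C[i][j] == 0:
                continue
            diff = [b - a for a, b in zip(r[i], r[j])]  # r_j - r_i
            dist = math.sqrt(dot(diff, diff))
            if dist == 0.0:
                continue  # coincident points: direction undefined
            w = C[i][j] * f(dist) / dist
            for k in range(d):
                F[i][k] += w * diff[k]
    return F


def sphere_step(r, C, f, delta, mu=1.0):
    """Eq. (15) with the tangential force (22), then rescale so that |r_i| = 1."""
    new_r = []
    for ri, Fi in zip(r, forces(r, C, f)):
        radial = dot(Fi, ri)
        Ft = [Fk - radial * rk for Fk, rk in zip(Fi, ri)]       # eq. (22)
        x = [rk + (delta / mu) * Fk for rk, Fk in zip(ri, Ft)]  # eq. (15)
        norm = math.sqrt(dot(x, x))
        new_r.append([xk / norm for xk in x])                   # back onto the sphere
    return new_r


r0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
C = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # only V0-V1 connected
r1 = sphere_step(r0, C, f=lambda d: 1.0 / d, delta=0.1)
print(math.dist(r0[0], r0[1]), math.dist(r1[0], r1[1]))  # the distance shrinks
```

The isolated vertex $V_2$ feels no force and stays put, while the connected pair moves together along the sphere.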

The basic conjecture we make is the following:

"After a sufficiently long time T (or sufficiently many steps $s = T/\delta$) has elapsed, so that any point has moved on average an appreciable distance away from its initial location, $|\vec r_i(T) - \vec r_i(0)| \gtrsim a \sim 1$, geometrical clusters of points tend to form. The points in each geometrical cluster correspond, to a good approximation, to the original vertices in a "cluster of the graph" which these points represent."

In the following we motivate this conjecture. 

We recall the definition of a cluster in the graph as a subset $C_i$ of $n_i$ vertices with a higher than average number of connections between them and a lower than average number of connections with external vertices. At $t = 0$, the points representing any subset of p vertices out of the n vertices in the graph reside at the p vertices of a $(p-1)$-dimensional symmetric simplex. Altogether there are $\binom{n}{p}$ such "faces" of our original $(n-1)$-dimensional simplex.

To most clearly illustrate our point, let us assume an "ideal graph cluster", so that in first approximation we completely neglect those forces attracting members of the cluster (more precisely, point masses representing vertices in the cluster) to "outside" points. Had we also omitted the constraint (20), then the forces acting between the $n_i$ points of the cluster $C_i$ would initially, and hence at all subsequent times, be restricted to the corresponding $(n_i-1)$-dimensional face. Repeating the argument made originally for the full set of n vertices, a collapse of these $n_i$ points into some point inside the $n_i$ simplex (i.e. on the $(n_i-1)$-dimensional face) is guaranteed. With the constraint (20) enforced, the set of $n_i$ points will still collapse, but now not to a point on the $n_i$ simplex but to a point on the "spherical $n_i$ simplex", which is the projection of the simplex on the unit sphere. The point of common clustering need not be at the geometrical center of this spherical $(n_i-1)$-dimensional face. However, unless the cluster in question is very asymmetric in its internal connections, it may not be too far from it.

Let us next turn on the few forces pulling members of the cluster towards external vertices, i.e. points initially residing outside this face. Such pulls may slightly shift the location of the clustering point away from the $(n_i-1)$-dimensional spherical "face". They are unlikely to disrupt completely the clustering of the vertices $V_j \in C_i$ belonging to the cluster.

We believe that the tendency to cluster will persist even in the more general case when 
the clusters are not so sharply defined. 

Let us focus on one particular vertex $V_i$ located at $t = 0$ at $\vec r_i$, one of the n simplex vertices. Among all the $\binom{n}{n_i}$ subsets of $n_i$ vertices, i.e. $(n_i-1)$-dimensional faces, a subset of $\binom{n-1}{n_i-1}$ shares the specific $V_i$. Stated differently, $\binom{n-1}{n_i-1}$ different $(n_i-1)$-dimensional faces intersect at the $V_i$ considered: $n-1$ edges, $\binom{n-1}{2}$ triangles, $\binom{n-1}{3}$ tetrahedra, and so on. Furthermore, each of the triangles includes two of the $n-1$ edges impinging at $V_i$, every tetrahedron contains three of these edges, etc.

Let us next assume that among all such $\binom{n-1}{n_i-1}$ simplexes there is a particular one, which we denote by $S^*$, such that the point in question, $\vec r_i$, has a maximal number of forces acting in its direction (as compared with the number of forces acting in the direction of any one of the other simplexes). This is the reflection, in our dynamical model, of the fact that the vertex $V_i$ belongs in a cluster $C_i$, i.e. has more connections to $V_j \in C_i$ than to vertices in any other subset of $n_i$ vertices.

It is obvious that in the initially symmetric situation the point $\vec r_i$ will then start moving in the direction of that specific $(n_i-1)$-dimensional face, since the force in its direction will be maximal. The motion will not be exactly in this hyperplane, as $V_i$ may have some external connections and consequently there will be forces on the mass point $\vec r_i$ in other directions. However, since the largest force component is along this direction and the initial displacement $\delta \vec r_i = \vec r_i(\delta) - \vec r_i(0) \propto \vec F_i$, the largest displacement component is also along it. This motion will then be the first small step towards the formation of the physical cluster of the points representing $C_i$.

Now at $t = 0$ all vertices start moving. If all (or most) of the $n_i$ vertices on the simplex (face) in question share this same feature of $V_i$, then all (or most) of the $n_i$ points will tend to migrate away from the initial $n_i$ vertices of the simplex in question and move toward its interior. Once the representative points start to cluster on or near the corresponding $(n_i-1)$-dimensional spherical face, the non-linear aspects of the many-body dynamical evolution come into play. These will tend to enhance and accelerate the clustering in several ways.

One such effect is simple. As the group of points starts to come closer together, the average distances $|\vec r_j - \vec r_k|$ with $V_j, V_k \in C_i$ decrease. By assumption the attractive forces between them become stronger. This then accelerates the clustering of the points which started to cluster.

A presumably subtler effect is the more coherent pull on "straggling vertices". These vertices belong to a strong cluster but, due to "accidental" connections to some different group of vertices, start moving in a different direction.

The initial forces acting on any vertex have an angle of 60° between any pair (since $(\vec r_j - \vec r_i) \cdot (\vec r_k - \vec r_i) = \frac{n}{n-1} = \frac{1}{2}|\vec r_j - \vec r_i|^2$). However, because of our constraint of staying on the sphere we need to consider only the projection on the $(n-2)$-dimensional hyperplane tangent to the sphere at the vertex $V_i$, say, in question. After this projection the $n-1$ edges emanating from $V_i$ span the $(n-2)$-dimensional hyperplane in just the same symmetric manner as the n unit vectors span the original $(n-1)$-dimensional n simplex. Hence, as in eq. (9) with $n \to n-1$, the angle between members of any pair of the effective tangential forces satisfies

$$\cos\theta_{jk}^{\,(\mathrm{projected})} = -\frac{1}{n-2}. \qquad (23)$$

Thus if $V_i$ were connected to all the remaining $n-1$ vertices in the original graph, the sum of all the (tangential!) forces acting on it would vanish. In reality the valency of $V_i$, $v_i$ = total number of vertices directly connected to it, is much smaller than $n-1$. The almost orthogonal $v_i$ forces acting on it will thus tend to add in quadrature. The same holds a fortiori for the $v_{iC_i}$ forces directed to the face representing the cluster $C_i$. ($v_{iC_i}$ is the partial valency of $V_i$ with respect to $C_i$, namely the number of vertices in $C_i$ connected to $V_i$.)
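The 60° angle and eq. (23) can be checked numerically, again assuming the explicit simplex $\vec r_i = \sqrt{n/(n-1)}\,(\vec e_i - \frac{1}{n}(1, \dots, 1))$ for this sketch. It also confirms that the $n-1$ projected unit edge directions at a vertex sum to zero, which is why a vertex connected to all others would feel no net tangential force at $t = 0$ (all initial distances, and hence all force magnitudes, being equal).

```python
import math


def simplex_vertices(n):
    """Assumed embedding: r_i = sqrt(n/(n-1)) * (e_i - (1,...,1)/n) in R^n."""
    scale = math.sqrt(n / (n - 1))
    return [[scale * ((1.0 if k == i else 0.0) - 1.0 / n) for k in range(n)]
            for i in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def edge_directions(r, i):
    """Unit vectors along the edges r_j - r_i leaving vertex i, both as they are
    and projected onto the hyperplane tangent to the unit sphere at r_i."""
    raw, tangential = [], []
    for j, rj in enumerate(r):
        if j == i:
            continue
        e = [b - a for a, b in zip(r[i], rj)]
        raw.append(unit(e))
        radial = dot(e, r[i])
        tangential.append(unit([ek - radial * rk for ek, rk in zip(e, r[i])]))
    return raw, tangential


n = 8
raw, tan = edge_directions(simplex_vertices(n), 0)
print(dot(raw[0], raw[1]))                # close to 0.5, i.e. 60 degrees
print(dot(tan[0], tan[1]), -1 / (n - 2))  # both close to -1/6 (eq. 23)
total = [sum(c) for c in zip(*tan)]       # V_0 connected to every other vertex
print(math.sqrt(dot(total, total)))       # close to 0: the tangential pulls cancel
```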
The initial force component along the $(n_i-1)$-dimensional face is then:

$$|\vec F_i^{\{C_i\}}(t=0)| = \Big|\sum_{j \in C_i} \vec F_{ij}(t=0)\Big| \propto (v_{iC_i})^{1/2}. \qquad (24)$$

Assume however that after some time $t_0$ most points corresponding to the putative cluster, and in particular the $v_{iC_i}$ points in the cluster connected to $V_i$, have already bunched together on the surface of the sphere. The various forces exerted by these $v_{iC_i}$ points on $V_i$ will now be almost parallel and instead of (24) we will have

$$|\vec F_i^{\{C_i\}}(t>t_0)| = \Big|\sum_{j \in C_i} \vec F_{ij}(t>t_0)\Big| \propto v_{iC_i}. \qquad (25)$$

The resulting force will be considerably enhanced if $v_{iC_i} \gg 1$.
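The $\sqrt{v_{iC_i}}$ versus $v_{iC_i}$ scaling in (24) and (25) follows from a short computation. Assume, for illustration, that all $k = v_{iC_i}$ forces have equal magnitude (as at $t = 0$, where all distances are equal and $C_{ij} = 1$), so that only their unit directions $\hat u_j$ matter, with pairwise cosine given by (23) at $t = 0$:

```latex
% t = 0: k = v_{iC_i} unit directions with pairwise cosine -1/(n-2), eq. (23)
\Big|\sum_{j=1}^{k}\hat u_j\Big|^2
  = k + k(k-1)\Big(-\frac{1}{n-2}\Big)
  = k\Big(1-\frac{k-1}{n-2}\Big) \approx k \qquad (k \ll n),
\quad\text{so}\quad |\vec F_i^{\{C_i\}}(t=0)| \propto k^{1/2}.

% t > t_0: the k points have bunched, so all \hat u_j \approx \hat u
\Big|\sum_{j=1}^{k}\hat u_j\Big| \approx k,
\qquad
\frac{|\vec F_i^{\{C_i\}}(t>t_0)|}{|\vec F_i^{\{C_i\}}(t=0)|} \approx k^{1/2}
\quad\text{(at equal per-pair magnitudes).}
```

For $k = n-1$ the first expression vanishes, recovering the cancellation noted after (23).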

If the vertices in the original graph had on average small overall valency, then $v_{iC_i}$ could happen to be small, say 2 or 3. The $\sqrt{v_{iC_i}}$ enhancement of (25) relative to (24) would then be minimal. Also $v_{iC_i}$ could be smaller than the number of connections that $V_i$ happens to have with points in some random $(n_i-1)$-dimensional face. The vertex $V_i$ will then "wander off" at $t = 0$ in the direction of this face rather than that of the "correct" face corresponding to the cluster $C_i$. We can avoid such situations and enhance the coherence effect discussed above by replacing the original connectivity matrix by an appropriate power $C^s$, in which the overall valencies (and in particular the valencies pertaining to clusters) are (particularly) strongly enhanced.

Note that an "error" due to an initial wandering off of $V_i$ in the direction of some random face which corresponds to no cluster in the graph is corrected by the very clustering which is assumed to occur. The other points in the "random" face will, by assumption, tend to migrate out of this face into other faces where these points can more efficiently cluster (physically). Finding no nearby cluster on the wrong face, the "straying" vertex $V_i$ in question is likely to be pulled back into the original cluster $C_i$ (or to another cluster which formed in the meantime and to which $V_i$ is more strongly connected).

Thus our dynamical evolution process can be construed not just as motion of n points on the unit sphere in $n-1$ dimensions; rather, we can view it as a competition between the putative different (physical!) clusters for additional members (points). In this ongoing "tug of war", clusters with stronger internal connectivity are likely to "win over" further members and form first.
Once the points corresponding to a cluster in the graph have "bunched" close together, they become effectively one dynamical unit, a "supervertex". Not only will all the points pull coherently on external points, but the converse also naturally holds: the clustered points will tend to respond coherently, as one dynamical unit, to an external force. Thus assume that we try to "pull away" one member point. Due to its close proximity to other members of the cluster, the point in question will strongly pull on those connected to it. The latter points in turn will pull on further points in the cluster, etc., and eventually the whole cluster will move in response to the external force. Hence the compact clusters, once formed, will be stable with respect to "stray" external "tidal forces".

The actual emergence of the physical clusters can be readily ascertained. Once $|\vec r_i - \vec r_j|$ is smaller than a prescribed small number, the pair of points is "merged" into one point at $(\vec r_i + \vec r_j)/2$. Actually we need at this point to project $(\vec r_i + \vec r_j)/2$ again onto the sphere. In further evolution the force acting on the merged point is the sum total of all the forces acting on $\vec r_i$ and $\vec r_j$. Also the resulting point should be endowed with the summed viscosity and/or inertia, $\mu_{(ij)} = \mu_i + \mu_j$ and/or $m_{(ij)} = m_i + m_j$ (twice the original value for equal masses). This new, doubled-up point represents a new graph derived from the original by identifying $V_i$ and $V_j$. It has $n-1$ vertices and its connectivity matrix has the same elements $C_{ll'}$ when both $l$ and $l'$ differ from either i or j. The new vertex $V_{(ij)}$ is now connected to all the vertices which were connected to either i or j.

We can keep on merging, using at each step the center of mass

$$\vec r_{(ij)} = \frac{m_i \vec r_i + m_j \vec r_j}{m_i + m_j} \qquad (26)$$

(again projected onto the unit sphere) as the merge point. Also we keep adding the masses and viscosities of the merged points and keep the connections to all vertices/points presently existing.
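The merging rule can be written out as a small function. In this plain-Python sketch (hypothetical names), the merged vertex takes the smaller of the two indices, its position is the mass-weighted centre (26) rescaled to unit length, masses add, and connection weights add, so a vertex linked to both $V_i$ and $V_j$ ends up with weight 2, which matches the block-counting rule for supervertices described in the next paragraph. Dropping the internal $V_i$-$V_j$ edge rather than storing it as a self-loop is our assumption.

```python
import math


def merge_vertices(C, r, m, i, j):
    """Merge V_i and V_j into one point; returns (C', r', m') for n - 1 vertices.
    Position: mass-weighted centre, eq. (26), rescaled to unit length (the merged
    points are close, so the centre is never near the origin).
    Mass: m_i + m_j. Connections: weights of i and j are added.
    The merged vertex keeps index min(i, j); index max(i, j) is removed.
    Assumption: the internal i-j edge is dropped (no self-loop is kept)."""
    a, b = min(i, j), max(i, j)
    com = [(m[a] * x + m[b] * y) / (m[a] + m[b]) for x, y in zip(r[a], r[b])]
    norm = math.sqrt(sum(x * x for x in com))
    keep = [k for k in range(len(C)) if k != b]

    def weight(k, l):
        if k == l:
            return 0                  # no self-connection
        if k == a:
            return C[a][l] + C[b][l]  # merged row
        if l == a:
            return C[k][a] + C[k][b]  # merged column
        return C[k][l]                # untouched entry

    new_C = [[weight(k, l) for l in keep] for k in keep]
    new_r = [[x / norm for x in com] if k == a else list(r[k]) for k in keep]
    new_m = [m[a] + m[b] if k == a else m[k] for k in keep]
    return new_C, new_r, new_m


# Edges V0-V1, V0-V2, V1-V3; merge V0 and V1.
C = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
r = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
new_C, new_r, new_m = merge_vertices(C, r, [1, 1, 1, 1], 0, 1)
print(new_C)  # [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
print(new_m)  # [2, 1, 1]
```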

Ideally this process would lead, after a reasonable number of steps s, to k "supervertices" corresponding to the k blocks $\{n_1 \times n_1, n_2 \times n_2, \dots, n_k \times n_k\}$ in the properly ordered original $C$ matrix of Fig. 1. The off-diagonal element $C'_{ll'}$ of the resulting supergraph will here be the total number of "1" entries in the original matrix $C$ in the $n_l \times n_{l'}$ rectangle at the "intersection" of the $C_l$ and $C_{l'}$ blocks. We could now repeat a similar dynamical procedure for the k "supervertices". This is in fact what the above algorithm is doing anyway, in a relatively smooth and continuous manner.

Instead of merging pairs of closely aligned points, we can identify various physical clusters with some minimal number of points and merge those as above.
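One simple way to identify physical clusters with some minimal number of points is single-linkage grouping: join any two points closer than a threshold and take the connected components. The union-find sketch below is an assumed choice of grouping rule, and `threshold` and `min_size` are free parameters of this illustration.

```python
import math


def find_clusters(r, threshold, min_size=1):
    """Single-linkage grouping: points closer than `threshold` are joined, chains
    of such links form one group; groups smaller than `min_size` are discarded."""
    n = len(r)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(r[i], r[j]) < threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) >= min_size)


points = [[1.0, 0.0], [0.999, 0.0447], [0.0, 1.0], [0.02, 0.9998], [-1.0, 0.0]]
print(find_clusters(points, threshold=0.1, min_size=2))  # [[0, 1], [2, 3]]
```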

The technicalities of how we actually merge the clustered points aside, the key question of whether the optimal desired clusters will form to start with still remains. The dynamical evolution described here forms clusters of all sizes: small ones with few members, larger ones which may include all or parts of smaller clusters, and the one big supercluster containing all n vertices. By constraining $|\vec r_i| = 1$ at all times, we have blocked the fastest route towards forming the overall cluster, namely via radial infall. Clustering will eventually still occur at some point on the unit sphere.

In structure formation in three dimensions, creating small clusters requires the particles forming the cluster to travel shorter distances than in the case of bigger clusters. Due to the peculiar geometry of n particles in $n-1$ dimensions described in Appendix A, this intuition does not carry over: formation of any cluster requires roughly the same distance to be covered regardless of the size of the cluster. Hence we are not guaranteed, for essentially kinematic reasons, that the smaller clusters will form first, en route to the bigger clusters, which is the desired scenario for our purposes here.

Careful tuning of the force law $f(r)$ helps achieve such a scenario. For $f(r) \sim c/r^{\alpha}$ with large $\alpha$, small differences in the distances will have a large effect. (Note that for the gravitational force in $n-1$ dimensions, $\alpha = n-2$.) Too strong a rise for $r < r_{\mathrm{initial}}$ and fall for $r > r_{\mathrm{initial}}$ may, however, lead to accidental clustering of some small groups. In particular, it may diminish the effect of the corrective mechanism, described above, via the coherent pull of the elements of the cluster on straying elements.

An interesting alternative prevents complete clustering but allows formation of clusters with higher than average internal connectivity. We introduce, in addition to the above attractive force between vertices $V_i$ and $V_j$ with $C_{ij} \ne 0$, repulsive forces when $V_i$ and $V_j$ are not connected:

$$\vec G_{ij} = g(r_{ij})\, \frac{\vec r_i - \vec r_j}{|\vec r_i - \vec r_j|} \quad \text{for } C_{ij} = 0. \qquad (27)$$

Again this can be derived from a repulsive potential $W(r)$.
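Adding the repulsion (27) to the constrained first-order step requires only one extra branch in the force loop. In this plain-Python sketch, `g` is a caller-supplied repulsive force law (the example uses the hypothetical $g(r) = 1/r$); scaling `g` down over time would be one way to implement the gradual phasing out of repulsion mentioned below.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def net_forces(r, C, f, g):
    """Attraction C_ij f(r_ij) toward r_j if C_ij != 0 (eq. 11), and
    repulsion g(r_ij) away from r_j if C_ij == 0 (eq. 27)."""
    n, d = len(r), len(r[0])
    F = [[0.0] * d for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            diff = [b - a for a, b in zip(r[i], r[j])]  # r_j - r_i
            dist = math.sqrt(dot(diff, diff))
            if dist == 0.0:
                continue
            if C[i][j] != 0:
                w = C[i][j] * f(dist) / dist  # pull toward r_j
            else:
                w = -g(dist) / dist           # push away from r_j
            for k in range(d):
                F[i][k] += w * diff[k]
    return F


def sphere_step(r, C, f, g, delta, mu=1.0):
    """First-order step (15) with tangential forces (22), then |r_i| = 1 again."""
    new_r = []
    for ri, Fi in zip(r, net_forces(r, C, f, g)):
        radial = dot(Fi, ri)
        x = [rk + (delta / mu) * (Fk - radial * rk) for rk, Fk in zip(ri, Fi)]
        norm = math.sqrt(dot(x, x))
        new_r.append([xk / norm for xk in x])
    return new_r


r0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
C = [[0, 0], [0, 0]]  # V0 and V1 not connected
r1 = sphere_step(r0, C, f=lambda d: 1.0 / d, g=lambda d: 1.0 / d, delta=0.1)
print(math.dist(r0[0], r0[1]), math.dist(r1[0], r1[1]))  # the distance grows
```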

Since in general we have many more unconnected than connected vertex pairs in a graph with large n, the repulsion can be weak relative to the attraction. Let us assume that the average valency is v. If all n points were to cluster physically, we would have $O(n^2/2)$ repulsive interactions $W(a)$, with a the size of the clustering region, and $O(nv/2)$ attractive interactions $V(a)$. Thus it suffices to have

$$W(a) > \frac{v}{n}\, V(a) \qquad (28)$$

in order to prevent complete clustering into one big supercluster. (The constraint $|\vec r_i(t)| = 1$ is still necessary to prevent vertices from being pushed to infinity!)

We note that as a putative new member is trying to join a cluster $C_i$, in which its valency $v_i\{C_i\}$ is higher than the average, we need, in order to facilitate joining, to satisfy the following condition:

$$W(a) < \frac{v_i\{C_i\}}{n_i}\, V(a). \qquad (29)$$

Since $v_i\{C_i\}$ is comparable to v and further $n_i \ll n$, we have a sizeable range of $W(a)/V(a)$ for which smaller clusters, but not very large ones, can first form. By gradually phasing out the repulsive forces once the smaller clusters have formed, we can proceed to forming bigger clusters, etc.

The repulsive forces would tend to move groups of points which are "distant" from each other, in the graph-theoretic sense of question Q2 in the Introduction, to antipodal points on the sphere. Utilizing such forces would help identify such groups as well.



IV. SPECIFIC APPLICATIONS 



To demonstrate the power of our approach we applied it to the problem of cluster identification in the 100-node network represented by the connectivity matrix $C$ of Fig. 1. This matrix consists of seven clusters whose internal connections were created randomly with valency 20%. These clusters were randomly interconnected with valency 3%. To simulate a real-life situation, in which the structure (topology) of the network is unknown, we randomly permuted the rows and columns of $C$ and obtained the reshuffled matrix $\tilde C$ shown in Fig. 2. Next we applied our cluster-reconstruction algorithm, using a combination of attractive and repulsive forces in $n - 1 = 99$ dimensional space. The vertices of the 100-simplex were allowed to move under the influence of the forces on the 98-dimensional hypersphere in 99 dimensions. After a number of steps we analyzed the mutual distances between the vertices of the simplex and grouped vertices that are close to each other into separate clusters. Fig. 3 shows the new cluster connectivity matrix. We see the seven "big" clusters of the matrix $C$ on a background of a few small clusters, which are due to the random (but still rather high-valency) inter-cluster connections. The procedure identifies the cluster structure of the network and also enumerates and tabulates all the nodes in each cluster. We further note that, unlike those in the original Fig. 1, the distances between the different clusters in Fig. 3 reflect the actual graph-theoretic distances between them.

FIG. 3: Cluster connectivity matrix for the reshuffled connectivity matrix $\tilde C$ (100 × 100 plot).
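The sketch below reproduces the test setup in outline. The paper does not give the individual cluster sizes, so splitting the 100 nodes into clusters of 15, 15, 14, 14, 14, 14 and 14 is an assumption, as are all function names. Here "valency" is taken to mean the probability that a given pair of vertices is connected. The last helper shows one possible grouping step: single-linkage grouping with a distance threshold, chosen only for illustration, because the paper does not specify its grouping rule.

```python
import random


def planted_cluster_matrix(sizes, p_in, p_out, seed=None):
    """Symmetric 0/1 matrix: pairs inside a cluster are linked with
    probability p_in, pairs in different clusters with probability p_out."""
    rng = random.Random(seed)
    labels = [c for c, s in enumerate(sizes) for _ in range(s)]
    n = len(labels)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # no self-loops
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                C[i][j] = C[j][i] = 1
    return C, labels


def reshuffle(C, seed=None):
    """Relabel vertices by a random permutation: C~[a][b] = C[perm[a]][perm[b]]."""
    rng = random.Random(seed)
    n = len(C)
    perm = list(range(n))
    rng.shuffle(perm)
    return [[C[perm[a]][perm[b]] for b in range(n)] for a in range(n)], perm


def group_by_distance(positions, threshold):
    """Single-linkage grouping: vertices closer than `threshold`, directly or
    through a chain of close vertices, end up in the same group."""
    n = len(positions)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            if d2 < threshold ** 2:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)


# Usage: seven planted clusters (assumed sizes), then a random relabelling.
C, labels = planted_cluster_matrix([15, 15, 14, 14, 14, 14, 14], 0.20, 0.03, seed=1)
C_tilde, perm = reshuffle(C, seed=2)
```

The relabelling is exactly the permutation similarity transformation from the Introduction, so `C_tilde` and `C` describe the same graph.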



APPENDIX A 



Some geometrical aspects of the $n$ simplex and its $(p-1)$-dimensional sub-simplex faces are relevant to our dynamical evolution. Most such features can be derived without using any specific coordinate representation.

The fundamental relation

$\vec r_i \cdot \vec r_j = -\frac{1}{n-1} \quad \text{for all } i \neq j,\; i,j = 1 \ldots n \qquad (A1)$

was derived above by using $\left(\sum_i \vec r_i\right)^2 = 0$ and symmetry. It allowed us to deduce the length of any edge:

$|\vec r_i - \vec r_j| = \sqrt{\frac{2n}{n-1}}.$

Such edges can be viewed as 1-dimensional, 2-point sub-simplices.
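For reference, both steps can be written out explicitly. Here $x$ denotes the common value of $\vec r_i \cdot \vec r_j$ for $i \neq j$, which symmetry makes the same for every pair:

```latex
0 = \Big(\sum_{i=1}^{n}\vec r_i\Big)^2
  = \sum_{i=1}^{n}|\vec r_i|^2 + \sum_{i\neq j}\vec r_i\cdot\vec r_j
  = n + n(n-1)\,x
  \quad\Longrightarrow\quad x = -\frac{1}{n-1},

|\vec r_i-\vec r_j|^2 = |\vec r_i|^2 + |\vec r_j|^2 - 2\,\vec r_i\cdot\vec r_j
  = 2 + \frac{2}{n-1} = \frac{2n}{n-1}.
```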



We also have triangles, namely 3-simplices, forming 2-dimensional "faces", and so on up to general $p$-simplices.

Let $r_p$ denote the radius of the sphere circumscribing the $p$ simplex and $d_p$ the distance of its center from the origin (namely the center of the original $n$ simplex). Clearly $d_p^2 + r_p^2 = 1$. Let $\vec r_{i_1} \ldots \vec r_{i_p}$ be the $p$ unit vectors of the $p$ simplex. All the $i_k$ are different, and there are $\binom{n}{p}$ such possible subsets of the $n$ original $\vec r_i$. The vector from the origin to the center of the $p$ simplex is:

$\vec d_p = (\vec r_{i_1} + \vec r_{i_2} + \cdots + \vec r_{i_p})/p. \qquad (A3)$

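Hence, using (A1) again, we find:

$d_p^2 = \frac{1}{p^2}\left[p - \frac{p(p-1)}{n-1}\right] = \frac{n-p}{(n-1)p}, \qquad (A4)$

and

$r_p = \sqrt{1 - d_p^2} = \sqrt{\frac{n(p-1)}{(n-1)p}}. \qquad (A5)$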

$r_p$ is the distance from any vertex of the $p$ simplex to its center. Except for very small $p$ (representing "tiny" clusters), all the $r_p$ are $O(1)$. Forming such clusters therefore requires the vertices to travel roughly the same distance as forming bigger ones.
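(A3) to (A5) are easy to check numerically. The sketch below illustrates the formulas and is not part of the original derivation. It uses an assumed centroid construction of the simplex (centered, renormalized standard basis) rather than the recursion (A10) and (A11). It takes the first $p$ vertices as the face and compares the measured $d_p$ and $r_p$ with the formulas. The function names are hypothetical.

```python
import math


def regular_simplex(n):
    """Assumed construction: centered, renormalized standard basis of R^n."""
    pts = []
    for i in range(n):
        v = [(1.0 if k == i else 0.0) - 1.0 / n for k in range(n)]
        length = math.sqrt(sum(x * x for x in v))
        pts.append([x / length for x in v])
    return pts


def face_geometry(pts, face):
    """Return (d_p, r_p) for the sub-simplex spanned by the vertex indices in face."""
    p, dim = len(face), len(pts[0])
    center = [sum(pts[i][k] for i in face) / p for k in range(dim)]  # (A3)
    d_p = math.sqrt(sum(c * c for c in center))
    first = pts[face[0]]
    r_p = math.sqrt(sum((first[k] - center[k]) ** 2 for k in range(dim)))
    return d_p, r_p


def predicted(n, p):
    """(A4) and (A5): d_p^2 = (n-p)/((n-1)p),  r_p^2 = n(p-1)/((n-1)p)."""
    return math.sqrt((n - p) / ((n - 1) * p)), math.sqrt(n * (p - 1) / ((n - 1) * p))


n = 10
pts = regular_simplex(n)
for p in (2, 5, 9):
    print(p, face_geometry(pts, list(range(p))), predicted(n, p))
```

The printed values show $r_p$ staying of order one as $p$ grows, which is the point made in the text.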

The angular separation $\theta_p$ between a vertex $\vec r_{i_k}$ of the $p$ simplex and $\vec d_p$, the vector from the origin to its center, is given by:

$\theta_p = \arccos(d_p) = \frac{\pi}{2} - \arcsin(d_p) \approx \frac{\pi}{2} - \sqrt{\frac{n-p}{(n-1)p}}. \qquad (A6)$
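The step behind (A6) follows from (A1) and (A4): the projection of any vertex of the face onto $\vec d_p$ equals $d_p^2$. The final approximation uses $\arcsin x \approx x$, which is accurate when $d_p$ is small, i.e. for $p \gg 1$, since $d_p^2 \le 1/p$.

```latex
\vec r_{i_1}\cdot\vec d_p
  = \frac{1}{p}\Big(1 - \frac{p-1}{n-1}\Big)
  = \frac{n-p}{(n-1)p} = d_p^2
\quad\Longrightarrow\quad
\cos\theta_p = \frac{\vec r_{i_1}\cdot\vec d_p}{|\vec r_{i_1}|\,|\vec d_p|} = d_p,
\qquad
\sin\theta_p = \sqrt{1-d_p^2} = r_p .
```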

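Two $p$ simplices can differ by just one, two, ..., $q$, ..., or $p-1$ points. The distance between the centers of two neighboring $p$ simplices that differ by $q$ vertices (and share $p-q$ common vertices) grows with $q$ for fixed $p$, as follows. The vector connecting the two centers $\vec d_p^{(1)}$ and $\vec d_p^{(2)}$ is:

$\vec d_p^{(1)} - \vec d_p^{(2)} = \frac{1}{p}\left(\sum_{i=1}^{q} \vec r_i^{(1)} - \sum_{i=1}^{q} \vec r_i^{(2)}\right). \qquad (A7)$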
Here the sets $\{\vec r_i^{(1)}\}$ ($\{\vec r_i^{(2)}\}$) denote the $q$ points in the first (second) $p$ simplex that are not shared by the two. The common $\vec r_i$'s cancel in the difference and do not contribute to the distance $r_p^{(q)} = |\vec d_p^{(1)} - \vec d_p^{(2)}|$. Using (A7) and $\vec r_i \cdot \vec r_j = -1/(n-1)$ we find:

$r_p^{(q)} = \frac{\sqrt{2q}}{p}\sqrt{\frac{n}{n-1}}. \qquad (A8)$



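Hence the angle between $\vec d_p^{(1)}$ and $\vec d_p^{(2)}$, the vectors to the centers of the two simplices, is given by:

$\theta_p^{(q)} = 2\arcsin\left(\frac{r_p^{(q)}}{2d_p}\right) = 2\arcsin\sqrt{\frac{n}{n-p}\,\frac{q}{2p}}. \qquad (A9)$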



The last equation displays a nice feature. When two $p$ clusters differ by only a small fraction $q/p$ of their vertices, the angular distance between the centers of their (spherical) faces is small. The angular separation grows once $q \sim p \ll n$, reaching $\theta_p^{(q)} \sim \pi/2$.
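(A8) and (A9) can be checked numerically in the same way. This sketch again uses the assumed centroid construction, and its function names are hypothetical. The first face is vertices $0 \ldots p-1$. The second face replaces the last $q$ of those with $q$ vertices outside the first face, so the two faces share $p-q$ vertices. This requires $p + q \le n$.

```python
import math


def regular_simplex(n):
    """Assumed construction: centered, renormalized standard basis of R^n."""
    pts = []
    for i in range(n):
        v = [(1.0 if k == i else 0.0) - 1.0 / n for k in range(n)]
        length = math.sqrt(sum(x * x for x in v))
        pts.append([x / length for x in v])
    return pts


def center(pts, face):
    dim = len(pts[0])
    return [sum(pts[i][k] for i in face) / len(face) for k in range(dim)]


def measured(n, p, q):
    """Distance and angle between the centers of two p-faces sharing p-q vertices."""
    pts = regular_simplex(n)
    face1 = list(range(p))
    face2 = list(range(p - q)) + list(range(p, p + q))
    c1, c2 = center(pts, face1), center(pts, face2)
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))
    cos_angle = sum(a * b for a, b in zip(c1, c2)) / (
        math.sqrt(sum(a * a for a in c1)) * math.sqrt(sum(b * b for b in c2)))
    # Clamp against rounding before acos (matters when the centers are antipodal).
    return dist, math.acos(max(-1.0, min(1.0, cos_angle)))


def predicted(n, p, q):
    dist = math.sqrt(2 * q) / p * math.sqrt(n / (n - 1))          # (A8)
    angle = 2 * math.asin(math.sqrt(n / (n - p) * q / (2 * p)))    # (A9)
    return dist, angle


for q in range(1, 6):
    print(q, measured(20, 5, q), predicted(20, 5, q))
```

For $n = 20$ and $p = 5$, the printed angle grows steadily with $q$, as the text describes.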

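For our simulations we need an explicit representation of the $\vec r_i$. Assume we know it for the $n-1$ simplex (in $n-2$ dimensions), and denote those vectors by $\vec\vartheta_1 \ldots \vec\vartheta_{n-1}$. Each $\vec\vartheta_j$ is an $(n-2)$-dimensional vector with known components:

$\vec\vartheta_j = \vartheta_{j,1}\hat e_1 + \cdots + \vartheta_{j,n-2}\hat e_{n-2},$

with $\hat e_l$ the unit vector along the $l$-th axis. When going from $n-1$ to $n$ we choose

$\vec r_n = \hat e_{n-1}, \qquad (A10)$

$\vec r_j = \lambda_n \vec\vartheta_j - \frac{1}{n-1}\hat e_{n-1}, \qquad j = 1 \ldots n-1. \qquad (A11)$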



The normalizing factor $\lambda_n = \sqrt{1 - 1/(n-1)^2}$ ensures $|\vec r_j| = 1$, given that $|\vec\vartheta_j| = 1$. Thus, starting with a two simplex with $x_1 = 1$, $x_2 = -1$, we inductively generate any $n$ simplex.
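The recursion (A10) and (A11) translates directly into code. The sketch below uses a hypothetical function name. It starts from the two simplex $x_1 = 1$, $x_2 = -1$. Each step from $m-1$ to $m$ vertices does three things: it rescales the old vertices by $\lambda_m$, appends the coordinate $-1/(m-1)$ along the new axis, and adds the new vertex $\hat e_{m-1}$.

```python
import math


def inductive_simplex(n):
    """n unit vectors in n-1 dimensions with r_i . r_j = -1/(n-1),
    built with the recursion (A10)-(A11)."""
    if n < 2:
        raise ValueError("need at least two vertices")
    pts = [[1.0], [-1.0]]  # the two simplex in one dimension
    for m in range(3, n + 1):
        lam = math.sqrt(1.0 - 1.0 / (m - 1) ** 2)  # normalizing factor lambda_m
        # (A11): rescale the old vertices and add the coordinate -1/(m-1)
        pts = [[lam * x for x in v] + [-1.0 / (m - 1)] for v in pts]
        # (A10): the new vertex is the unit vector along the new axis
        pts.append([0.0] * (m - 2) + [1.0])
    return pts
```

For example, `inductive_simplex(3)` returns three unit vectors in the plane at 120° to one another. Any two of the $n$ vectors produced are separated by the edge length $\sqrt{2n/(n-1)}$.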